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Abstract 

We briefly review recent progress in techniques for modeling and ana¬ 
lyzing hyperspectral images and movies, in particular for detecting plumes 
of both known and unknown chemicals. For detecting chemicals of known 
spectrum, we extend the technique of using a single subspace for modeling 
the background to a “mixture of subspaces” model to tackle more com¬ 
plicated background. Furthermore, we use partial least squares regression 
on a resampled training set to boost performance. For the detection of 
unknown chemicals we view the problem as an anomaly detection prob¬ 
lem, and use novel estimators with low-sampled complexity for intrinsi¬ 
cally low-dimensional data in high-dimensions that enable us to model 
the “normal” spectra and detect anomalies. We apply these algorithms 
to benchmark data sets made available by the Automated Target Detec¬ 
tion program co-funded by NSF, DTRA and NGA, and compare, when 
applicable, to current state-of-the-art algorithms, with favorable results. 

Keywords. Remote sensing, mixture models, robust modeling chemical 
plumes, automated detection. 


1 Introduction 

The ability to remotely sense and analyze chemical plumes has become increas¬ 
ingly important in many civilian and military applications. Hyperspectral imag¬ 
ing sensors that operate in the long-wave infrared (LWIR) part of the spectrum 
are particularly well suited for these chemical-sensing tasks, because their wide 
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fields of view and high spectral resolutions allow many square kilometers to be 
imaged almost simultaneously and even optically thin chemical clouds to be 
detected. Data collected by these sensors has the form of m x n x p arrays, 
where m, n are the spatial dimensions and p is the spectral dimension. Under 
physically reasonable assumptions and simplifications [1], the following linear 
mixing model is obtained: 


x= y] ftSi+v, (1) 

i<i<Ar 

where x G represents the radiance spectrum in the scene, N the number of 
chemicals of interest, G the signature spectrum for the i-th chemical of 
interest, gi > 0 the amount of chemical i, and v the radiance spectrum of the 
background. In practice, it is usually assumed that N < 3. In this paper, due 
to the data under consideration, we consider the case N = 1 (thus, x = ps + v). 
However, the methods discussed in this paper can be naturally extended to the 
cases where N > 1. 

To effectively separate the chemical clouds from the background clutter, 
one needs to choose a proper model for the background radiation v. Current 
approaches (e.g. mm) often represent the background by a single Gaussian dis¬ 
tribution or subspace and then derive corresponding statistical estimators which 
assign detection scores to each spectrum (see the review [B). These detection 
algorithms have shown their effectiveness in many applications, but there is 
room for improvement. In particular, background often consists of heteroge¬ 
neous regions (such as sky, mountain, desert; see e.g. [1] Fig. 9]) which may 
require separate subspaces to better capture the complexity of the background. 
Some other methods tackle the problem alternatively 013012 , including the 
case when the target plume is unknown [8] . 

Our contributions in this work are threefold: (a) we extend the single¬ 
subspace model [3] for modeling the background to a “mixture of subspaces” 
model; (b) we propose techniques to enhance the detectability of the chemical 
plume region, moving beyond classical least squares and using resampling tech¬ 
niques to boost detection; and (c) when detection of chemicals with unknown 
signature is of interest, we propose a flexible anomaly detection procedure which 
efficiently constructs an empirical model of “normal” spectra and flags anoma¬ 
lous ones according to likelihood assigned by the empirical model. In particular, 
(c) is achieved by applying an efficient multiscale transform [9] to the hyper spec¬ 
tral spectra of a training frame and then model the density of the transformed 
pixels for computing the likelihood of each spectrum in any testing frame: if the 
testing frame contains gas, then the gas region will be flagged as an anomaly if 
it is assigned low likelihood values. 

We consider two different scenarios, depending on whether we are given only 
a single hyperspectral cube or a time series of them. In Scenario (I), we use the 
available cube both for learning an empirical model for the background and for 
chemical detection, while in Scenario (II) we use the first few frames (assumed 
to be without chemical plume) for background modeling and any subsequent 
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frame for testing for anomalies based on the learned background model. If 
the sensor is assumed stationary, one could use the temporal variation of the 
spectrum at each location to detect changes and detect anomalies: we do not 
assume that the sensor is stationary (albeit it is such in most of the data we 
consider) and therefore we do not use spatial coherence information across the 
images. This makes the problem harder, but the proposed solution applicable 
in a wider variety of settings. 

2 Previous Work for Known Target Signature 

Based on different assumptions on the background radiation v, different detec¬ 
tion algorithms have been proposed for the setting when the signature of the 
target plume is known. In this section we review some of the existing approaches 
following the presentation of [T]. 

2.1 Gaussian Models 

The simplest and most practical algorithm for chemical gas detection uses the 
linear model in 0 and assumes normally distributed background clutter with 
known covariance, that is, 

X = + V, V ^ 

where the plume spectral signature s is also assumed to be known. The coef¬ 
ficients g and can be estimated by maximizing the likelihood from a given 
sample. The gas detection problem can be formalized as that of testing the 
hypotheses 

Ho : Plume absent (^ = 0); Hi : Plume present (^ > 0). 

By applying the generalized likelihood ratio test (GLRT) one obtains the fol¬ 
lowing detector: 


^nmf(x I /i, H, s) = 


(s^S-is)(5c^S-i5c)’ 


X = X — /i 


( 2 ) 


This is known as the normalized matched filter (NMF) and also the adaptive 
cosine or coherence estimator (ACE) [2]. Since the NMF requires estimation 
and inversion of the background covariance matrix E, they are approximated in 
the following way in order to avoid numerical instability and to obtain robust 
detectors: 

p 

t, = Y^XkCikCil + SI, = J2 \ + 

k=l k=l ^ 

where and q/^ {k = I,-- - ,p) are the eigenvalues and eigenvectors of the 
sample variance, is a regularization parameter and can be chosen as certain 
percentile of the values of {Xk}- For the data of our interest in this paper, the 
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performance of ACE is not sensitive to the choice of S as long as it is moderate, 
say between 30 and 80 percentiles of {A/c}. In the experiments of this paper, we 
choose S as the median of {A/c}. 

2.2 Subspace Models 

Another category of approaches for chemical plume detection assumes that the 
background clutter can be well represented by a low-dimensional subspace. In 
that case, the signal model becomes 

X = sg -h ju -h Be + e, e ^ Ar(0, (j‘^1) 

where /i, B represent a fixed point and a basis for the background subspace 
and e is the additive noise. To find /i,5, principal component analysis (PCA) 
is applied to the background spectra. Similarly to before, the quantities g, c 
and a are estimated by maximizing the likelihood. Due to the assumption of 
normally distributed error e, the maximum likelihood estimators (MLE) of g 
and c may be computed by least squares. 

The GLRT approach yields the following detector [3]: 

r„ss(x|ftB,s)= j|||^ (3) 

where and are projection matrices: 

P^ = I - B{B^B)-^B^; 

P,i = I-A{A^A)-^A^, A=[sB]. 

Note that ||P^-^x|| and ||P^-^x|| are respectively the orthogonal distances from the 
background and target-background subspaces. 


2.3 Multiple Plumes 

These algorithms can be naturally extended to the case where N > i.e., there 

are more than one chemical plumes. The corresponding detectors can be written 
as: 


Tnmf(x I /i, E, S) = - x^E~^x - ’ X = X 

where P = [si, • • • , sat] and Si G W is the signature spectrum for the 


chem¬ 


ical plume, and 

Jnss(x I 

where and are projection matrices: 

P^ = I - 

Pti = I - AiA^Ay^A^, A = [SB]. 
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3 Mixture models and Enhancement with Known 
Target Signature 

In this section, we describe both the mixture models and some techniques to 
enhance the detection. Our algorithm is presented at the end of this section. 

3.1 Mixture Models 

Gaussian mixture models v ^ 7rjAf{iij, Xlj), where Af represents the Normal 
distribution, may be used to capture complex backgrounds. We are particularly 
interested in the case where is rank-deficient, and therefore is 

supported on an affine subspace spanned by cols(5j), the columns of a matrix 
Bj. More generally, in a subspace mixture model v ^ 7rjS{fij, Bj) where 
S^jij^Bj) is a probability measure with mean jXj and support contained in the 
subspace spanned by the columns of Bj. 

The model parameters tt^ and Gj ((/ij, ^j) or {jUj^Bj)) in each case can be 
estimated by iC-means, iC-means-like subspace clustering algorithms (e.g. nni 
mm), fast multiscale techniques m, or Expectation-Maximization (EM) 
methods, through iterations between updating cluster assignments and model 
parameters. 

The signal x is then assigned to the cluster that maximizes the estimation 
of 7rjp(x|0j): 

j = argmax7rjp(x | Qj). 
j 

Given a target plume signature s, the mixture versions of the two estimators, 
NME (also known as AGE) and NSS, are given by 

^mixNMF(x I S, {TTj, 0j}) = Tnmf(x | S,/ipE;:.); 

(4) 

^mixNSs(x I S, {tT^, 0j}) = Tnss(x | S,/ij.,5j.). 

Alternatively, for the subspace mixture model, we may simply use the coef¬ 
ficient g as the detector and solve for it by least squares. Specifically, letting j 
be defined as above, it is easy to show that the least squares estimator ^ of is 
the first entry of 

/3 = (4A3.)-^4(x-m3.), ^j = [sS3.]. 

This yields the mixture Linear Goefhcient (LG) estimator 


^mixLC(x I s, ^Trjf, 0ji}) — max|^,0}. 

The knowledge about the background complexity can be used to choose the 
appropriate number of Gaussians (or subspaces), e.g., for the data in Section 
we use three components since the scene has sky, mountain and ground areas. 
In practice, the detection performance is not sensitive when this number is 
overestimated and there are enough sample spectra. These mixture models can 
also be applied with the detectors that handle multiple plumes. 


5.2 
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Finally, in section we consider a related but different family of multiscale 
models for background spectra, in the context of anomaly detection in hyper- 
spectral movies, in the setting when the spectral signature s of the chemical of 
interest is not given, nor the frames in the movies are assumed to be registered. 

3.2 Enhancement Techniques 

We propose a few enhancement techniques for background estimation to reduce 
the affects of the contamination: 

3.2.1 Outlier Removal 

We detect and remove a small fraction of the spectra that are dissimilar to the 
main part of the data, in terms of magnitude or connectivity (in this paper 
we simply compute the sum of squares of each spectrum as its magnitude and 
classify those spectra with largest magnitudes as outliers). 

3.2.2 Resampling Enhancement 

This technique is relevant only when we are in Scenario (I). For this goal we 
utilize an iterative scheme. We first choose a few likely background spectra 
based on a reliable detection score (output of a detection algorithm, e.g. ACE), 
and then select their spatial neighbors as well, since adjacent pixels are very 
likely to be of the same category. Using these selected spectra, the background 
model parameters are re-estimated and the detection statistics are re-deduced 
accordingly. 


Algorithm 1 Resampling Enhancement 

Input: C spectra; C M: detection score; s G plume 

signature; ri G (0,1): portion. 

Output: ‘E M: enhanced detection score. 

1: Sort Ti s.t., T( 1 ) < 

2: Choose (5i = Let Ai := {x* : Tj < (5i}. 

3: Let B be the union of Ai and the 4 (above, below, left and right) spatial 
neighbors of Xi G A. 

4: Re-estimate model parameters 0j’s from B. 

5: Compute the statistic (detection score) based on the updated model. 


3.2.3 Partial Least Squares Regression (PLSR) Enhancement 

PLSR [H reduces the dimensionality of the data in the way that the covariance 
between predictors and responses is maximized. The response is re-estimated 
on the reduced predictors. In our problem, PLSR is applied so that the ra¬ 
diance data is projected to a subspace that is most relevant to the detection 
score (output of a detection algorithm, or enhanced detection score resulted 
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from resampling). A new and enhanced detection score is computed from the 
projected radiance data. In fact, we select a certain amount of spectra which 
are most likely to be background as well as chemical clouds and apply PLSR on 
the selections. 


Algorithm 2 PLSR Enhancement 

Input: C spectra; C M: detection score; s G plume 

signature; r 2 , rs G ( 0 , 1 ): portions. 

Output: M: enhanced detection score. 

1 : Sort Ti s.t., T( 1 ) < 

2 : Choose 62 = T(^[T 2 mn]) and S 3 = Let A 2 := {xj : Ti < 62 } and 

^3 := {xi : Ti > < 53 }. 

3: Apply PLSR on the selected pairs ((Ki^Ti) with G A 2 IJA 3 to obtain 
parameters /3 eW and /Sq G M. 

4: yi = /3^Xi + /3o, i = 1, • • • , mn. 


3.3 Algorithm for Plume Detection in Hyperspectral Im¬ 
ages or Movies 

We propose a unifying algorithm (see Algorithm. that can detect chemi¬ 
cal plumes in both scenarios: in Scenario (I) when we have only a single hy¬ 
perspectral cube, we incorporate the enhancement techniques proposed in the 
previous section to learn the background via mixture modeling; in Scenario 
(II) where we are given a time series of hyperspectral cubes, we assume that 
the first few frames were collected before the chemical release so we may use 
their spectra for background modeling. Afterwards, we select a detector (from 
^mixNMF,^mixNSS,^mixLc)) and apply it to the given cube(s). 


Algorithm 3 Chemical plume detection through mixture background modeling 
in hyperspectral images or movies 

luput: {x-^^}^^,l < i < L: L hyperspectral frame(s); s G W: plume signa¬ 
ture, and detector (one of the TmixNMF, ^mixNSS, ^mixLc)- 
Output: C M: detection score 

1 : Fit a mixture model {Qj} to the given cube (when L = 1 ) or the first few 
clean frames (when L > 1 ). Apply the enhancement techniques if relevant. 
2 : for each frame ^ = 1 ,..., L, 

(1) Assign spectra of the i-th. frame to nearest component models by maxi¬ 
mizing the estimator of 7 rjp(x| 0 j) 

( 2 ) Evaluate the given detector for all spectra of the frame (within the 
corresponding clusters) to produce detection scores 

eud for 
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4 Anomaly Detection in hyperspectral movies 
with Unknown Target Signature 

In this section we consider the problem of anomaly detection instead of detec¬ 
tion of known chemicals. In particular, we are interested in the application to 
hyperspectral movies, where at unknown time and location a gas is released 
and the gas plume needs to be tracked. We think of the chemical plume re¬ 
gion, once released, as a set of anomalous spectra, when compared against the 
background clutter, and thus we base detection on accurately estimating the 
probability measure modeling the space of the background spectra and comput¬ 
ing the likelihood scores of every spectrum in any testing frame relative to the 
density model. 

In the original paper on Geometric Multi-Resolution Analysis (GMRA) [9], 
an approach for approximating probability distributions in high-dimensions us¬ 
ing the intrinsically low-dimensional GMRA structure was suggested, and those 
ideas were further developed in Ha IT]. In this case we are not given sig¬ 
natures of spectra to detect; instead we are given one or more hyperspectral 
scenes defined “normal” (a training set), and given a new hyperspectral scene 
we are interested in deciding if its spectra are normal or present “anomalies”. 
We model this problem as follows: we assume that there is an unknown prob¬ 
ability measure v in from which “normal” spectra are drawn. The training 
set X := {x^}A ^ c consisting of all the spectra in the training hyperspec¬ 
tral scenes is modeled as n i.i.d. samples from z/Q We use these n samples to 
learn a probability measure z>x approximating, in a suitable sense, u. Given a 
new scene, i.e. a new set of samples we could ask what is the 

probability of seeing x^®^ if it was sampled from z^, and call x^®^ an anomaly if 
this probability is below a certain threshold. Unfortunately this does not make 
sense since typically u (and our estimator Ox) do not assign positive probabil¬ 
ity to any point. Often one then replaces probabilities by probability densities 
and associated likelihoods. An alternative is to replace the question above by 
evaluating the probability (according to Ox) of seeing a point within distance r 
from x^®^, and decide whether x^®^ is an anomaly or not based on a threshold 
on such probability, i.e. we declare x^®^ an anomaly if (5^(x^®^)) < 77 . 
The values of 77 and r tune the sensitivity of the anomaly detection. We may 
choose them by first fixing 77 , then choosing r minimizing type I or type II er¬ 
ror (or other similar criterion), or by choosing r to be smallest value such that 
(5^(x)'^^)) > 77 for all i’s, where is a validation data set (possibly 

extracted and excluded from a training set). Then as we vary 77 we obtain a 
ROG for the anomaly detector (assuming we know the ground truth). 

The key problem here is to efficiently construct an estimator Ox- this is a 
challenging task, since u is in with p typically large {p > 100 is common). 
This is accomplished by using techniques based on GMRA, originally suggested 
in [ 9 ], and further developed and analyzed in [T3j [18]. We have no space to 

^Clearly, independence is a rather strong assumption, but could be relaxed with only rather 
minor technical difficulties to more general settings that accommodate mild dependencies. 



describe the details of these constructions. At a high level, one can think of 
GMRA as a principled and efficient way of encoding a data set that, while high- 
dimensional, is intrinsically low-dimensional, at different levels of accuracy, by 
first constructing a multiscale tree decomposition of the data, and then in each 
node of the tree, corresponding to a portion of the data, construct a data- 
driven low-dimensional projection of the data, and further compressing these 
projections by encoding the difference between the projection at one node and 
those of children nodes in the tree [ 9 ] . The results in [ 18 ] guarantee that under 
suitable geometric assumptions about the data - essentially the data should 
be close to an (unknown) low-dimensional manifold - GMRA run on the data 
will efficiently construct sparse representations of the data. In order to extend 
this construction to the estimation of probability measure, at each node of the 
GMRA tree we estimate a simple probability measure (e.g. a low-rank Gaussian) 
and combine these measures appropriately to construct a probability measure 
i)x approximating v. The results in mun] guarantee that - under suitable 
geometric assumptions on the data and on the regularity of vx - the estimator 
gets increasingly closer to v (in Wasserstein distance) with high probability 
as N increases, and with a rate that depends only on the intrinsic dimension d of 
the data and not on the ambient dimension p. Furthermore, these constructions 
are yielded by efficient algorithms, having complexity essentially linear in the 
training data size pN, with a constant depending essentially on the intrinsic 
dimension of the data d. In the setting of hyperspectral images considered here, 
N = mn is the number of pixels in an image. The intrinsic dimension of the 
data, measured by Multiscale SVD nani] is often very small in hyperspectral 
data, between I and 5 . Moreover, these algorithms m-m easily allow to 
quickly (in 0^(logA/')) incorporate new samples in an online fashion, in both 
the GMRA construction and the estimator z>x. Finally, the underlying GMRA 
construction, simultaneously to the above, yields a dictionary that sparsifies 
the data (in our case the spectra) [181 ED? enabling the compression of the 
data. These representations also lead to a variation of Gompressed Sensing 
where the data model is that of a nonlinear manifold, with extremely efficient 
inversion algorithms that do not require the solution of high-dimensional convex 
optimization problems m- 

5 Experimental Results 

In this section we consider both synthetic data and some real data sets in the 
Golorado State Repository (will write GSR for short thereafter) made available 
through the ATD program {Algorithms for Threat Detection)^ co-sponsored by 
NSF, DTRA and NGA. We start with synthetic data to present the functionality 
of our models in simple situations, and subsequently consider real data sets, 
demonstrating the effectiveness of our methods in applications. 
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Algorithm 4 Multiscale-transform based Density Estimation 

Input: Data set 

Output: Multiscale densities 

1 : Apply GMRA to the training data to obtain a multiscale dictionary 

2 : For each j > 0, /c G A^, apply the Geometric Wavelet Transform to the data 
in each Cj^k and obtain low dimensional coefficients 

3: Apply a density estimator (e.g. KDE m) to each set of geometric scaling 
and wavelet coefficients corresponding to each Cj^k^ and obtain density es¬ 
timates also record ^j^ki fhe empirical probability of a point belonging 
to Cj^k' 

4: By testing on a validation set, select an optimal scale j* and return the 
mixture model 


5.1 Synthetic Data Sets 

5.1.1 Gaussiau Mixture Models 

Description. We use synthetic data set to realize the Gaussian mixture model. 
We take the MIT Lincoln Lab Ghallenge Data (see details in the next section) to 
obtain the mean spectra for three regions - sky, mountain and ground (denoted 
by /ii, /i 2 and fis respectively) and the target plume signature s. These spectra 
consist of 68 measurements at different values of wavelength and are shown in 
Figure 



Figure 1: Left: the mean spectra of sky, mountain and ground. Right: the 
absorption of the chemical plume. 

Moreover, 5, 000, 5,000 and 4, 000 spectra are generated i.i.d. from Gaussian 
distributions A/’(/ii,Fi), and A/’(/i 3 ,F 3 ) respectively for the three 

regions, where = diag{crf ^^5 ’'' ? i = 1, 2, 3 and all cr^j’s are drawn i.i.d. 

from the uniform distribution on [0.002a, 0.008a] with a = max(/i 3 ). Finally, 
1,000 spectra on the ground with chemical plume are generated from gs + v, 
where v ^ A/’(/i 3 , F 3 ) and g ^ A/'(— 0.01, 0.001). Top left of Figure [^displays 5 
sample spectra for each of the groups - sky, mountain, ground and plume. 
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Figure 2: Top left: radiance against wavenumber of 5 samples each from the 
groups of sky, mountain, ground and plume. Top right: the ACE scores of 10 
spectra samples computed by 1 Gaussian and 3 Gaussians respectively. Bottom: 
the boxplot of the AGE scores of using 1 Gaussian (left) and 3 Gaussians (right). 
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Task . Detect the plume and compare the Gaussian mixture model with the 
single Gaussian model by the detection. 

Technique. We compute the AGE (or NMF) statistics (detection scores) using 
the ground truth. More precisely, we compute TmixNMF by @ using the true 
values of /i^ and with i = 1, 2, 3 and Tnmf by (§ using /i = (/ii+/i2+/i3)/3 
and E = (El + E 2 + E3)/3 for all spectra. 

Results . Ten sample AGE statistics each from the 4 groups are demonstrated 
on the top right of Figure for both the single Gaussian and mixture Gaussian 
models. In addition, the summary of these statistics are shown as boxplot at 
the bottom of Figure with the single Gaussian on the left and the mixture 
Gaussian on the right. The boxes are colored by groups of sky, mountain, 
ground and plume. On each box, the central mark is the median, the edges 
of the box are the 25th and 75th percentiles, the whiskers extend to the most 
extreme data points not considered as outliers. These figures show that when 
using three Gaussians instead of a single Gaussian, the AGE statistics of the 
plumes are greatly larger than those of the other regions. This makes the plumes 
more separable and thus more detectable. 

5.1.2 Subspace Mixture Models 

Description. We use synthetic data set to realize the subspace mixture model. 
5,000, 5,000 and 4,000 spectra are generated i.i.d. from c/ii + e, c/i 2 + ^ and 
c/i 3 + e respectively for sky, mountain and ground regions, where c ^ A7(l, 0.01), 
e ~ A/'(0,I]e), Se = diag{(7^,-- - and CTg j’s are drawn i.i.d. from 

A/’(0, 0.005 max(/i 3 )). Then 1,000 spectra on the ground with plume are gener¬ 
ated as gs + c/i 3 + e with g ^ A7(—0.01,0.001). Top left of Figure [^displays 5 
sample spectra for each of the groups - sky, mountain, ground and plume. 

Task . Detect the plume and compare the subspace mixture model with the 
single subspace model by the detection. 

Technique. We compute the NSS statistics (detection scores) using the ground 
truth. More precisely, we compute TmixNSS by Q letting = /i^, i = 1, 2, 3 
and Tnss by using B =Ui+ ii 2 + A<3)/3. 

Results . Ten sample NSS statistics each from the 4 groups are demonstrated 
on the top right of Figure for both the single subspace and mixture subspace 
models. In addition, the summary of these statistics are shown as boxplot at 
the bottom of Figure with the single subspace on the left and the mixture 
subspace on the right. From these figures, we know that the separation between 
the plume and other regions greatly improves when using the mixture subspace 
model. 


5.1.3 Partial Least Square Regression 

Description. We generate 10, 000 spectra from /i+6e, where g = (/ii+/i2+M3)/3 
, b = max/i and ej ^ Poisson(0.005), i.i.d., j = 1, • • • ,68. Then 1,000 spectra 
are generated from g be gs with g ^ A7(—0.01, 0.001). 10 sample spectra of 
each class (scene and plume) are shown in Figure]^ 
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Figure 3: Top left: radiance against wavenumber of 5 samples each from the 
groups of sky, mountain, ground and plume. Top right: the NSS scores of 10 
samples computed by 1 subspace and 3 subspaces respectively. Bottom: the 
boxplot of the NSS scores of using 1 Gaussian (left) and 3 Gaussians (right). 
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Figure 5: Top left: ACE statistics of 50 sample spectra of each class; Top right: 
ACE statistics after PLSR of the same 50 sample spectra of each class; Bottom 
left: the boxplot of the ACE statistics; Bottom right: the boxplot of the ACE 
statistics with PLSR. 


Task . Detect the plume and compare the performances before and after PLSR. 
Technique. We compute the ACE statistics as described in Section Then we 
apply Algorithm with T 2 = T 3 = 0.2 to the computed ACE scores. 

Results . The top of Eigurej^ displays the 50 sample ACE scores on the left and 
the corresponding enhanced results after applying PLSR on the right. Their 
statistical summary are demonstrated as boxplots at the bottom of Eigure 
with ACE on the left and PLSR on the right. These figures demonstrate that 
the PLSR procedure greatly improves the separation of the detection scores 
between scene and plume. 


5.1.4 Multiple Chemical Plumes 


Description. We follow Section 


5.1.1 


to generate 5, 000 sample spectra each for 


the sky, mountain and ground areas, then we generate 1,000 sample spectra 
for the ground area each with chemical plume 1 and 2. The signature (si,S 2 ) 
of these two plumes are demonstrated on the top left of Eigure Einally we 
generate 100 sample spectra for the ground area with both chemical plumes 
from ^iSi +^282 + V, where v ^ A/’(/i 3 ,E 3 ) and ^ 1,^2 ^ A/'(— 0 . 01 , 0 . 001 ). On 
the top right of Eigure sample spectra of different areas are displayed. 

Task . Detect where a plume (or both plumes) are present. 

Technique. We compute the detectors as described in Section 12.3 
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Figure 6: Top left: signature spectra of plume 1 and plume 2. Top right: sample 
spectra of different areas, with and without plumes. Bottom left: boxplot of the 
ACE statistics for different areas. Bottom right: boxplot of the NSS statistics 
for different areas. 


Results . The box plots of the NMF (ACE) statistics and the NSS statistics are 
shown at the bottom of Figure It is evident that the regions with plumes 
can be distinguished from those without plumes very well. We will not conduct 
an experimental analysis as careful as for a single target plume for two reasons. 
First, all the real data under consideration have only a single chemical plume. 
Second, from the statistical point of view, our models (both Gaussian and sub¬ 
space) for different numbers of chemical plumes are essentially the same models 
with different dimensions. We will just use this simple example in this section 
to justify that our methods work effectively for A" > 1. 

5.2 MIT Lincoln Lab Challenge Data 

Description. We use hyperspectral images available at the CSR, collected by the 
MIT Lincoln Lab, and made available through the ATD program. This data set 
consists of four individual hyperspectral images, two of which are for released 
chemical plume and two for embedded plume. In each of the four, available 
data include a radiance data cube, its matrix form, the absorption coefficient 
spectrum of the chemical plume of interest and plume present mask. Data cubes 
are about of the size 200 x 300 x 100, where 200 x 300 is the spatial size and 
100 is the spectral dimension. 

Task . Detect the chemical plume from a single cube. 

Technique. We apply the three detection algorithms (NMF, NSS, LC), their 
corresponding mixture models (mixNMF, mixNSS, mixLC), and the two en¬ 
hancement techniques (resampling and PLSR) as described in Section and 
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The dimension for the subspaces is chosen to be 2 by looking at where the sin¬ 
gular values start to flatten. The parameters for the enhancement techniques 
are ri = 0.2, r 2 = rs = 0.15. 

Results . The receiver operating characteristic (ROC) curves are shown in Fig¬ 
ure for one of the embedded data cube (also the most difficult one). Com¬ 
parisons are made between single and mixture models on the left and between 
mixture models and those followed by resampling and then PLSR (the combi¬ 
nation of these two enhancement techniques are denoted as -eh) on the right. 
From the left figure, we see that using mixture models for background can im¬ 
prove the detection results. Among the three algorithms, mixture LC works 
the best. In the right figure, when adding the enhancement procedures, both 
mixNMF and mixNSS improve, but mixLC does not. Overall, mixNMF with 
enhancement outperforms others. 
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Figure 7: ROC curves. The left is comparison for single and mixture models and 
the right for mixture models before and after enhancement (denoted as -eh). 

Furthermore, we demonstrate the detection maps of the intermediate steps 
of the best performed method mixNMF-e/i, namely mixNMF, mixNMF with re¬ 
sampling (mixNMF-rs), mixNMF with resampling twice (mixNMF-rs2), mixNMF 
with resampling twice and followed by PLSR (mixNMF-rs2-plsr). These results 
are shown with those of the NMF on the original data (NMF-outliers) and on 
the data with outliers removed (NMF). All the methods stated in this section 
proceed the outlier removal first, except NMF-outliers. Figureshows that the 
mixture model and the various enhancement steps (resampling and PLSR) im¬ 
prove the results gradually; the plume region becomes more and more separable 
over these steps. 

5.3 Fabry-Perot Interferometer Sensor Data 

Deseription. This dataset consists of five time series of radiance data, collected 
using an imaging Spectroradiometer that operates in the 8-11 micron range 
under five different combinations of the kind of chemical material (TEP, or 
MeS, or GAA), the release amount (75 kg/burst or 150), and the sensor used. 
To generate each sequence, a predetermined quantity of simulant is released into 
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Figure 8: From top to bottom and from left to right: the detections map of 
NMF-outlier, MNF, mixNMF, mixNMF-rs, mixNMF-rs2 and mixNMF-rs2-plsr. 
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Figure 9: Detection results by ACE (left column) and TmixLC (right column) on 
a fixed frame in each of the five hyperspectral movies of the Fabry-Perot data 
set. Note that (1) our detection map is consistently cleaner than ACE, and (2) 
our algorithm significantly outperforms^ ACE in the third movie. 







the troposphere by using explosives (i.e. a burst release). Successive data cubes 
are collected, beginning before the event and terminating when it is deemed 
that additional data collected will not augment scientific value. As a result, 
each hyperspectral time series contains several hundred frames of hyperspectral 
images, which all have 256 x 256 spectra along 20 spectral bands. 

Task . Detect and track the chemical plume. 

Technique. We first check the time derivatives of the spectra (along the se¬ 
quence) but did not observe anything useful (this indicates that the data is 
particularly challenging). We then applied Algorithm. with the mixLC esti¬ 
mator and compared our results with ACE [2]. 

Results . The comparison is shown in Figure It is obvious that the detection 
map by our algorithm is always much cleaner than that by ACE. Furthermore, 
in one case, our algorithm was able to clearly locate the plume while ACE could 
not (see Figure]^ third column). In this experiment, we used the 5th frame for 
background modeling (via a union of lines) and the 35th frame for testing (for all 
five movies); future work will utilize multiple clean frames for joint background 
modeling. 


5.4 Anomaly Detection in Hyperspectral Movies 

Description. We use hyperspectral movies available at CSR, collected by Johns 
Hopkins Applied Physics Lab, and made available through the NSF ATD pro¬ 
gram. These movies are taken with an FTIR based long wave infrared sen¬ 
sor, recording a variety of releases of known chemicals in a gaseous, liquid and 
gaseous state. They have frames consisting of 128 x 320 x 120 hyperspectral 
cubes, with one collected approximately every 8 seconds. The data consists of 
a desert scene where an unknown (to us) chemical is released at an unknown 
location at an unknown time (after the beginning of the movie), growing into a 
chemical plume. 

Task . Detect and track the chemical plume. 

Technique. We detect the spectra in the chemical plume as an anomaly with 
respect to an empirical model for the background. We use the first two frames of 
the movies (where we are told that no chemical plum is present) as our training 
set X, and use the techniques detailed in section]^ to construct z>x, and detect 
anomalies. 

Results . An example of anomaly detection is visualized in Figure 10 In the 


context of these movies, the intrinsic dimension of the data d appears to be 
very low, typically d < 5, as measured by Multiscale SVD m-m- Therefore 
the number of samples required in order to learn Ox is expected to be low, even 
if the high-dimensional space R^ with p = 120. In practice, with non-optimized 
Matlab code, the construction of z>x, using the first frame as a training set 
(a much smaller set would be more than sufficient) takes about a minute, and 
the evaluation of Ox at all the spectra of a new frame takes a few seconds. 
The anomaly detection requires the choice of a threshold, but not knowing the 
ground truth we cannot study ROC curves. However the choice of threshold 
did not seem to affect the results much, especially in the detection of the “core” 
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part of the chemical plume, albeit it did affect the detection of (presumably 
less dense) regions of the plume: see Figure 10, where we chose on purpose a 
particularly conservative threshold. 


6 Computational Complexity 

For the algorithms in section the algorithm m to estimate the subspaces 
modeling background has computational complexity is 0 (cimn(dp+C 2 +log(mn))), 
where d is the intrinsic dimension of the subspaces, ci and C 2 are two parameters 
(set to 10 and 20). Algorithm [12] has similar cost and performance. Construct¬ 
ing the estimators TmixNMF, FmixNSS , F^ixLC requires O(p^), 0((p + d)d?) and 
0((p + d)d?) basic operations. Resampling has cost 0(mn log(mn)), PLSR re¬ 
quires 0{{pmn + +p/)/) operations, with I the dimension chosen for PLSR 

(typically 0(1)^ independently of p). The cost of the GMRA-based algorithms 
(see section 0 is: 0 {CdTnnp\og{mn)d‘^), where Cd is a constant that depends 
exponentially in the intrinsic dimension d of the subspaces, for constructing the 
GMRA; of 0{mnd?) for constructing the estimator using low-rank Gaussians 
or KDE, and for evaluating it at new points; 0{mnp\og{mn)) for computing 
the coefficients of new data needed to evaluate the likelihood. 

In summary, all the algorithms we discuss run in time proportional to (up to 
logarithmic factors) the size mnp of the hyperspectral data cube, with constants 
that depend on the intrinsic dimension d of the data. 

7 Conclusion 

We have presented several ideas aimed at improving the current state-of-art in 
several tasks related to the analysis of hyperspectral images, in particular for 
background modeling, chemical plume detection and anomaly detection. We 
discussed the application of these algorithms to a variety of data sets, with 
state-of-art or better results (when ground truth was available). The proposed 
techniques are diverse, but are mostly motivated by the observation that hy¬ 
perspectral data is often noisy but intrinsically low-dimensional, allowing one 
to use ideas from the dimension reduction and manifold learning algorithms 
originally considered in view of machine learning applications. In particular 
we use techniques for approximating data by mixtures of distributions on low¬ 
dimensional subspaces uni [mill], first with a small, fixed number of subspaces 
for background modeling, and then with more complex, multiscale mixtures of 
subspaces using GMRA [9] and its extensions to the estimate of probability 
measures in high-dimensions. 
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Figure 10: Anomaly detection of a chemical plume of an unknown chemical 
appearing at an unknown time and location in a hyperspectral movie. No spatial 
registration among the frames is assumed. The chemical plume is detected as 
anomaly, evolving in time and space. Left column: log-likelihood score according 
to empirical GMRA-based model, with darker red meaning lower log-likelihood, 
i.e. higher probability of being an anomaly. Right column: thresholded version 
of the images on the left, with a non-optimized, conservative threshold, showing 
detection of the chemical plume. Ground truth was not provided for this data 
set. 
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